Abstract. As the size of data storing arrays of disks grows, it becomes vital to protect data against
double disk failures. A popular method of protection is via the Reed-Solomon (RS) code with two parity 
words. In the present paper we construct alternative examples of linear block codes protecting against two 
erasures. Our construction is based on an abstract notion of cone. Concrete cones are constructed via matrix 
representations of cyclic groups of prime order. In particular, this construction produces EVENODD code. 
Interesting conditions on the prime number arise in our analysis of these codes. At the end, we analyse an 
assembly implementation of the corresponding system on a general purpose processor and compare its write 
and recovery speed with the standard DP-RAID system. 



1. Introduction

A typical storage solution targeting a small-to-medium size enterprise is a networked unit with 12 disk
drives and a total capacity of around 20 TB [13] [3]. The volume of information accumulated and stored by a
typical small-size information technology company amounts to fifty 100-gigabyte drives. The specified mean
time between failures (MTBF) for a modern desktop drive is about 500,000 hours [M], or roughly 57 years.
Assuming that such an MTBF is actually achieved and that the drives fail independently, the probability of
at least one disk failure in a 12-drive unit in the course of a year is $1 - e^{-12/57} \approx 0.2$.
Therefore, even a small company can no longer avoid the necessity of protecting its data against disk
failures. The use of redundant arrays of independent disks (RAIDs) enables such protection in a
cost-efficient manner.

To protect an array of $N$ disks against a single disk failure it is sufficient to add one more disk to the
array. For every $N$ bits of user data written on $N$ disks of the array, a parity bit equal to an exclusive OR
(XOR) of these bits is written on the $(N+1)$-st disk. The binary content of any disk can then be recovered as a
bitwise XOR of the contents of the remaining $N$ disks. The corresponding system for storing data and distributing
parity between disks of the array is referred to as RAID-5 [8]. Today, RAID-5 constitutes the most popular
solution for protected storage.

As the amount of data stored by humanity on magnetic media grows, the danger of multiple disk failures
within a single array becomes real. Maddock, Hart and Kean argue that for a storage system consisting
of one hundred 8 + P RAID-5 arrays the rate of failures amounts to losing one array every six months [8]. 
Because of this danger, RAID-5 is currently being replaced with RAID-6, which offers protection against 
double failure of drives within the array. RAID-6 refers to any technique where two strips of redundant data 
are added to the strips of user data, in such a way that all the information can be restored if any two strips 
are lost. 

A number of RAID-6 techniques are known [4], [5], [10]. A well-known RAID-6 scheme is based on the
rate-255/257 Reed-Solomon code [1]. In this scheme two extra disks are introduced for up to 255 disks of
data and two parity bytes are computed per 255 data bytes. Hardware implementation of RS-based RAID-6
is as simple as operations in F = GF(256), which are byte-based. Addition of bytes is just a bitwise
XOR. Multiplication of bytes corresponds to multiplication of boolean polynomials modulo an irreducible
polynomial. Multiplication can be implemented using XOR-gates, AND-gates and shifts.

Some RAID-6 schemes use only bitwise XOR for the computation of parity bits by exploiting a
two-dimensional striping of disks of the array. Examples are the proprietary RAID-DP developed by Network
Appliances and EVENODD. Some other RAID-6 methods use a non-trivial striping and employ only the
XOR operation for parity calculation and reconstruction. Examples include X-code, ZZS-code and Park-code
[8], [11], [12].



In all the cases mentioned above, the problem dealt with is inventing an error correcting block code capable 
of correcting up to two erasures (we assume that it is always known which disks have failed). In the present
paper we describe a general approach to the solution of this problem, which allows one to develop an optimal 
RAID-6 scheme for given technological constraints (e.g. available hardware, the number of disks in the array, 
the required read and write performance). We also consider an assembly implementation of an exemplary 
RAID-6 system built using our method and show that it outperforms the Linux kernel implementation of 
RS-based RAID-6. 

The paper is organised as follows. In Section 2 we discuss RAID-6 in the context of systematic linear 
block codes and construct simple examples of codes capable of correcting two errors in known positions. In 
Section 3 we identify an algebraic structure (cone) common to all such codes and use it to construct RAID-6 
schemes starting with elements of a cyclic group of a prime order. Section 3.3, where we discuss a new
condition on the prime numbers arising in the context of RAID-6 schemes, may be of particular interest to
number theorists. In Section 4 we compare encoding and decoding performances of an assembly implementation of
RAID-6 based on $Z_N$ with its RS-based counterpart implemented as a part of the Linux kernel.

Let us comment on the relation of the presented material to other modern research efforts. Section 2 is
rather standard. All original theoretical material of this paper is in Section 3. The notion of a cone is
somewhat related to a non-singular difference set of Blaum and Roth [5] but there are essential differences
between them. The cone from a cyclic group of prime order as in Lemma 3.4.1 gives EVENODD code [2].
Its extended versions and connections to number theoretic conditions are new.



2. RAID-6 FROM THE VIEWPOINT OF LINEAR BLOCK CODES.

Suppose that information to be written on the array of disks is broken into words of length $n$ bits. What
is the best rate linear block code which can protect data against the loss of two words?

Altogether, there are $2^{2n}$ possible pairs of words. In order to distinguish between them, one needs at least
$2^{2n}$ distinct syndromes. Therefore, any linear block code capable of restoring 2 lost words in known locations
must have at least $2n$ parity checks. Suppose the size of the information block is $Nn$ bits or $N$ words. In
the context of RAID, $N$ is the number of information disks to be protected against the failure. Then the
code's block size must be at least $(2 + N)n$ and the rate is

$$R \le \frac{N}{N+2}.$$

This result is intuitively clear: to protect $N$ information disks against double failure, we need at least 2
parity disks. Note however, that in order to achieve this optimal rate, the word length $n$ must grow with
the number of disks $N$. Indeed, the size of the parity check matrix is $2n \times (N+2)n$. All columns of the matrix
must be distinct and nonzero. Therefore, $(N+2)n \le 2^{2n} - 1$, i.e.

$$N \le \frac{2^{2n} - 1}{n} - 2.$$

If in particular $n = 1$, then $N \le 1$. Therefore, if one wants to protect information written on the disks against
double failures using just two parity disks, the word size $n \ge 2$ is necessary. If $n = 2$, we get $N \le 5$. In
reality, the lower bound on $n$ (or, equivalently, the upper bound on $N$) is more severe, as the condition that
all columns of the parity check matrix are distinct and nonzero guarantees only $d_{\min} \ge 3$, whereas a code
which corrects up to $2n$ erasures in known locations needs $d_{\min} \ge 2n + 1$.

In the following subsections we will construct explicit examples of linear codes for RAID-6 for small values
of $n$ and $N$. These examples both guide and illustrate our general construction of RAID-6 codes presented
in Section 3.

2.1. Redundant array of four independent disks, which protects against the failure of any two
disks. We restrict our attention to systematic linear block codes. These are determined by the parity check
matrix. To preserve backward compatibility with RAID-5 schemes, we require half of the parity bits to be
the straight XOR of the information bits. Hence the general form of the parity check matrix for $N = 2$ is

(1)  $P = \begin{pmatrix} I_{n\times n} & I_{n\times n} & I_{n\times n} & 0_{n\times n} \\ H & G & 0_{n\times n} & I_{n\times n} \end{pmatrix},$

where $I_{n\times n}$ and $0_{n\times n}$ are the $n \times n$ identity and zero matrices respectively; $G$ and $H$ are some
$n \times n$ binary matrices. The corresponding parity check equations are

(2)  $d_1 + d_2 + \pi_1 = 0$ and $H \cdot d_1 + G \cdot d_2 + \pi_2 = 0.$

Here $d_1, d_2$ are $n$-bit words written on disks 1 and 2, $\pi_1$ and $\pi_2$ are $n$-bit parity check words written on
disks 3 and 4; "$\cdot$" stands for binary matrix multiplication.

Matrices $G$ and $H$ defining the code are constrained by the condition that the system of parity check
equations must have a unique solution with respect to any pair of variables. To determine these constraints
we need to consider the following particular cases.

$(\pi_1, \pi_2)$ are lost. The system (2) always has a unique solution with respect to the lost variables: we can
compute parity bits in terms of information bits.

$(d_1, \pi_2)$ are lost. The system (2) always has a unique solution with respect to the lost variables: compute
$d_1$ in terms of $\pi_1$ and $d_2$ using the first equation of (2) as in RAID-5. Then compute $\pi_2$ using the second
equation.

$(d_2, \pi_2)$ are lost. The system (2) always has a unique solution with respect to the lost variables: compute
$d_2$ using $\pi_1$ and $d_1$ as in RAID-5. Then compute $\pi_2$ using the second equation of (2).

$(\pi_1, d_1)$ are lost. The system (2) has a unique solution with respect to the lost variables provided the
matrix $H$ is invertible.

$(\pi_1, d_2)$ are lost. The system (2) has a unique solution with respect to the lost variables provided the
matrix $G$ is invertible.

$(d_1, d_2)$ are lost. The system (2) has a unique solution with respect to the lost variables provided the
matrix $\begin{pmatrix} I_{n\times n} & I_{n\times n} \\ H & G \end{pmatrix}$ is invertible.

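As it turns out, one can build a parity check matrix satisfying all the non-degeneracy requirements listed
above for $n = 2$. The simplest choice is

(3)  $H = I_{2\times 2}, \quad G = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.$

Non-degeneracy of the three matrices $H$, $G$ and $\begin{pmatrix} I_{2\times 2} & I_{2\times 2} \\ H & G \end{pmatrix}$ is evident. For instance,$^1$

$\det \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = -1 = 1.$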

We conclude that the linear block code with the $4 \times 8$ parity check matrix (1), (3) gives rise to RAID-6
consisting of four disks. The computation of parity dibits $\pi_1, \pi_2$ in the described DP RAID is almost as
simple as the computation of regular parity bits. Let $d_1 = (d_{11}, d_{12})$ and $d_2 = (d_{21}, d_{22})$ be the dibits to be
written on disks one and two respectively. Then

(4)  $\pi_{11} = d_{11} + d_{21}, \quad \pi_{12} = d_{12} + d_{22},$
     $\pi_{21} = d_{11} + d_{22}, \quad \pi_{22} = d_{12} + d_{21} + d_{22}.$

The computations involved in the recovery of lost data are bitwise XOR only. As an illustration, let us write
down expressions for lost data bits in terms of parity bits explicitly:

$d_{22} = \pi_{11} + \pi_{12} + \pi_{21} + \pi_{22}, \quad d_{12} = \pi_{11} + \pi_{21} + \pi_{22},$
$d_{11} = \pi_{11} + \pi_{12} + \pi_{22}, \quad d_{21} = \pi_{12} + \pi_{22}.$
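The parity equations (4) and the recovery expressions above can be exercised with a few lines of C. This is a minimal sketch, not the authors' implementation: the packing of a dibit into the two low bits of a byte and the names `mul_G`, `encode`, `recover_data` and `roundtrip_ok` are assumptions made for illustration only.

```c
#include <stdint.h>

/* Assumed packing: a dibit d = (c1, c2) is stored as c1 | (c2 << 1). */

/* Multiply a dibit by G = [[0,1],[1,1]] over GF(2). */
static uint8_t mul_G(uint8_t d)
{
    uint8_t c1 = d & 1u;
    uint8_t c2 = (d >> 1) & 1u;
    uint8_t r1 = c2;            /* row (0 1) */
    uint8_t r2 = c1 ^ c2;       /* row (1 1) */
    return (uint8_t)(r1 | (r2 << 1));
}

/* Parity equations (4): pi1 = d1 + d2, pi2 = d1 + G*d2. */
static void encode(uint8_t d1, uint8_t d2, uint8_t *pi1, uint8_t *pi2)
{
    *pi1 = d1 ^ d2;
    *pi2 = d1 ^ mul_G(d2);
}

/* Recover both data dibits from the two parity dibits, XOR only. */
static void recover_data(uint8_t pi1, uint8_t pi2, uint8_t *d1, uint8_t *d2)
{
    uint8_t p11 = pi1 & 1u, p12 = (pi1 >> 1) & 1u;
    uint8_t p21 = pi2 & 1u, p22 = (pi2 >> 1) & 1u;
    uint8_t d11 = p11 ^ p12 ^ p22;
    uint8_t d12 = p11 ^ p21 ^ p22;
    uint8_t d21 = p12 ^ p22;
    uint8_t d22 = p11 ^ p12 ^ p21 ^ p22;
    *d1 = (uint8_t)(d11 | (d12 << 1));
    *d2 = (uint8_t)(d21 | (d22 << 1));
}

/* Returns 1 if recovery inverts encoding for all 16 pairs of data dibits. */
static int roundtrip_ok(void)
{
    for (uint8_t d1 = 0; d1 < 4; d1++)
        for (uint8_t d2 = 0; d2 < 4; d2++) {
            uint8_t pi1, pi2, r1, r2;
            encode(d1, d2, &pi1, &pi2);
            recover_data(pi1, pi2, &r1, &r2);
            if (r1 != d1 || r2 != d2)
                return 0;
        }
    return 1;
}
```

Calling `roundtrip_ok()` checks the hardest case, the loss of both data disks, exhaustively; the remaining cases reduce to ordinary RAID-5 style XOR.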

It is interesting to note that the RAID-6 code described here is equivalent to Network Appliances'
horizontal-diagonal parity RAID-DP™ with two data disks [7]. Indeed, the diagonal-horizontal parity system
for two information disks is

A  B  HP   DP1
C  D  HP2  DP2,

where strings (A, C) are written on information disk 1, strings (B, D) are written on disk 2, (HP, HP2) is
horizontal parity, (DP1, DP2) is diagonal parity. By definition, HP = A + B, HP2 = C + D, DP1 = A + D,
DP2 = B + C + D, which coincides with parity check equations (4).

On the other hand, the code (1) is a reduction of the RS code based on GF(4) which we will describe in
the next subsection.

$^1$ The reader is aware that $-1 = 1$ in characteristic 2.

2.2. Redundant array of five independent disks, which protects against the failure of any two
disks. The code (1) can be extended to a scheme providing double protection of user data written on three
disks [3, Example 1.1]. The parity check matrix is

(5)  $P = \begin{pmatrix} I_{2\times 2} & I_{2\times 2} & I_{2\times 2} & I_{2\times 2} & 0_{2\times 2} \\ I_{2\times 2} & G & G^2 & 0_{2\times 2} & I_{2\times 2} \end{pmatrix},$

where the $2 \times 2$ matrix $G$ was defined in the previous subsection. The corresponding parity check equations are

(6)  $d_1 + d_2 + d_3 + \pi_1 = 0, \quad d_1 + G \cdot d_2 + G^2 \cdot d_3 + \pi_2 = 0.$

The solubility of these equations with respect to any pair of variables from the set $\{d_1, d_2, d_3, \pi_1, \pi_2\}$
requires two extra conditions of non-degeneracy in addition to the non-degeneracy conditions listed in the
previous subsection. Namely, the matrices
$\begin{pmatrix} I_{2\times 2} & I_{2\times 2} \\ I_{2\times 2} & G^2 \end{pmatrix}$ and $\begin{pmatrix} I_{2\times 2} & I_{2\times 2} \\ G & G^2 \end{pmatrix}$
must be invertible. It is possible to check the invertibility of these matrices via a direct computation.
However, in the next section we will construct a generalisation of the above example and find an elegant
way of proving non-degeneracy.

The code (1) is a reduction of (5) corresponding to $d_3 = 0$. Note also that the code (5) is equivalent to the
rate-3/5 Reed-Solomon code based on GF(4): a direct check shows that the set of $2 \times 2$ matrices $0, 1, G, G^2$
is closed under multiplication and addition and all non-zero matrices are invertible. Thus this set forms a
field isomorphic to GF(4). On the other hand, as we established in the previous subsection, the code (1)
is equivalent to RAID-DP™ with four disks. Therefore, RAID-DP™ with four disks is a particular case
of the RS-based RAID-6. It would be interesting to see if RAID-DP™ can be reduced to the RS-based
RAID-6 in general.

We are now ready to formulate general properties of linear block codes suitable for RAID-6 and construct 
a new class of such codes. 

3. RAID-6 BASED ON THE CYCLIC GROUP OF A PRIME ORDER. 

3.1. RAID-6 and cones of $GL_n(F)$. In this subsection we will define a general mathematical object
underlying all existing algebraic RAID-6 schemes. We recall that $F = GF(2)$ is the field of two elements and
$GL_n(F)$ is the set of $n \times n$ invertible matrices.

Definition 3.1.1. A cone$^2$ $C$ is a subset of $GL_n(F)$ such that $g + h \in GL_n(F)$ for all $g \neq h \in C$.

This notion is related to non-singular difference sets of Blaum and Roth [3]. The cone satisfies the axioms
P1 and P2 of Blaum and Roth but the final axiom P3 or P3' is too restrictive for our ends. On the other
hand, we consider only binary codes while Blaum and Roth consider codes over any finite field.

A standard example of a cone appears in the context of Galois fields. If we choose a basis of $GF(2^n)$ as
a vector space over $F$ then we can think of $GL_n(F)$ as the group of all $F$-linear transformations of $GF(2^n)$.
Multiplications by non-zero elements of $GF(2^n)$ form a cone. If $\alpha \in GF(2^n)$ is a primitive generator and $g$
is the matrix of multiplication by $\alpha$ then this cone is $\{g^m \mid 0 \le m \le 2^n - 2\}$. This cone gives the RS-code
with two parity words.

The usefulness of cones for RAID-6 is explained by the following 

Lemma 3.1.2. Let $C = \{g_1, g_2, \dots, g_N\} \subseteq GL_n(F)$ be a cone of $N$ elements. Then the system of parity
equations

(7)  $d_{N+1} = \sum_{k=1}^{N} d_k$ and $d_{N+2} = \sum_{k=1}^{N} g_k d_k$

has a unique solution with respect to any pair of variables $(d_i, d_j) \in F^n \times F^n$, $1 \le i < j \le N + 2$. Here $d_i$
are binary $n$-dimensional vectors.

$^2$ This terminology is slightly questionable. If one asks $g + h \in C$ then $C \cup \{0\}$ is a convex cone in the usual
mathematical sense. Our choice of the term is influenced by this analogy. Non-singular difference set or
quasicone or RAID-cone could be more appropriate scientifically but would pay a heavy linguistic toll.

Proof. The fact that system (7) has a unique solution with respect to $(d_{N+1}, d_{N+2})$ is obvious.

The system has a unique solution with respect to $(d_{N+1}, d_j)$ for any $j \le N$: from the second of equations
(7), $d_j = g_j^{-1}\bigl(d_{N+2} + \sum_{k \neq j} g_k d_k\bigr)$, where we used invertibility of $g_j \in GL_n(F)$. With $d_j$ known, $d_{N+1}$ can
be computed from the first of equations (7).

The system has a unique solution with respect to $(d_{N+2}, d_j)$ for any $j \le N$: from the first of equations
(7), $d_j = d_{N+1} + \sum_{k \neq j} d_k$. With $d_j$ known, $d_{N+2}$ can be computed from the second of equations (7).

The system has a unique solution with respect to any pair of variables $d_i, d_j$ for $1 \le i < j \le N$:
multiplying the first of equations (7) with $g_i$ and adding the first and second equations, we get
$d_j = (g_i + g_j)^{-1}\bigl(g_i d_{N+1} + d_{N+2} + \sum_{k \neq i,j} (g_k + g_i) d_k\bigr)$. Here we used the invertibility of the sum $g_i + g_j$ for any
$i \neq j$, which follows from the definition of the cone. With $d_j$ known, $d_i$ can be determined from any of the
equations (7). QED

In the context of RAID-6, $d_i$ for $1 \le i \le N$ can be thought of as $n$-bit strings of user data, and $d_{N+1}, d_{N+2}$
as $n$-bit parity strings. The lemma proved above ensures that any two strings can be restored from the
remaining $N$ strings.

We conclude that any cone can be used to build RAID-6. The following lemma gives some necessary 
conditions for a cone. 

Lemma 3.1.3. Let $C \subseteq GL_n(F)$ be a cone.

(i) For all $g, h \in C$ such that $g \neq h$ and for all $x \in F^n$, $gx = hx$ if and only if $x = 0$.
(ii) No two elements of the same cone can share an eigenvector in $F^n$.
(iii) The cone $C$ can contain no more than one permutation matrix.

Proof. To prove (i), assume that there is $x \neq 0$ with $gx = hx$. Then $(g + h)x = 0$, which contradicts the fact
that $g + h$ is non-degenerate. Therefore, $x = 0$. Let us prove (ii) now. As elements of $C$ are non-degenerate,
the only possible eigenvalue in $F$ is 1, thus for any two elements sharing an eigenvector $x$, $x = hx = gx$,
which again would imply degeneracy of $h + g$ unless $x = 0$. The statement (iii) follows from (ii) if one notices
that any two permutation matrices share an eigenvector whose components are all equal to one. QED

The notion of the cone is convenient for restating well understood conditions for a linear block code to be 
capable of recovering up to two lost words. Our main challenge is to find examples of cones with sufficiently 
many elements, which lead to easily implementable RAID-6 systems. We will now construct a class of cones 
starting with elements of a cyclic subgroup of GL„(F) of a prime order. 

3.2. RAID-6 based on matrix generators of $Z_N$. We start with the following

Theorem 3.2.1. Let $N$ be an odd number. Let $g$ be an $n \times n$ binary matrix such that $g^N = Id$ and
$Id + g^m$ is non-degenerate for each proper$^3$ divisor $m$ of $N$. Then the elements of the cyclic group
$Z_N = \{Id, g, g^2, \dots, g^{N-1}\}$ form a cone.

The proof of Theorem 3.2.1 is based on the following two lemmas.

Lemma 3.2.2. Let $g$ be a binary matrix such that $Id + g$ is non-degenerate and $g^N = Id$, where $N$ is an
integer. Then

(8)  $\sum_{k=0}^{N-1} g^k = 0.$

Proof. Let us multiply the left hand side of (8) with $(Id + g)$ and simplify the result using that $h + h = 0$
for any binary matrix:

$(Id + g) \sum_{k=0}^{N-1} g^k = Id + g + g + g^2 + \dots + g^{N-1} + g^{N-1} + g^N = Id + g^N = Id + Id = 0.$

As $Id + g$ is non-degenerate, this implies that $\sum_{k=0}^{N-1} g^k = 0$. QED

Lemma 3.2.2 is a counterpart of the well-known fact from complex analysis that the $N$-th roots of unity add up to zero.
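As a quick sanity check of Lemma 3.2.2, one can take $N = 3$ and the $2 \times 2$ matrix $G$ of Section 2.1, for which $Id + G$ is non-degenerate and $G^3 = Id$. The worked computation below is an illustration only; all arithmetic is modulo 2.

```latex
\[
G = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
G^2 = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
G^3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = Id,
\]
\[
Id + G + G^2 =
\begin{pmatrix} 1+0+1 & 0+1+1 \\ 0+1+1 & 1+1+0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
```

Note that here $Id + G = G^2$, so the non-degeneracy of $Id + G$ is visible directly.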



$^3$ A natural number $m < N$ that divides $N$.

Lemma 3.2.3. Let $g$ be a binary matrix such that $g^N = Id$ for an odd number $N$ and $Id + g^m$ is
non-degenerate for every proper divisor $m$ of $N$. Then the matrix $g^k + g^l$ is non-degenerate for any
$k, l$: $0 \le k < l < N$.

Proof. As $g^N = Id$, the matrix $g$ is invertible. To prove the lemma, it is therefore sufficient to check the
non-degeneracy of $Id + g^k$ for $0 < k < N$.

The group $Z_N = \{1, g, g^2, \dots, g^{N-1}\}$ is cyclic. An element $g_k = g^k$ for $0 < k < N$ generates the cyclic
subgroup $Z_{N/d}$, where $d$ is the greatest common divisor of $N$ and $k$, and the element $g^d$ generates the
same subgroup. Since the matrix $g^d$ satisfies all the conditions of Lemma 3.2.2, the sum of all elements of
$Z_{N/d}$ is zero. Therefore,

(9)  $\sum_{m=0}^{N/d-1} g_k^m = (Id + g_k) + g_k^2 (Id + g_k) + \dots + g_k^{N/d-3} (Id + g_k) + g_k^{N/d-1} = 0.$

The grouping of terms used in (9) is possible as $N/d$ is odd. Assume that the matrix $1 + g_k$ is degenerate. Then
there exists a non-zero binary vector $x$ such that $(1 + g_k)x = 0$. Applying both sides of (9) to $x$ we get
$g_k^{N/d-1} x = g^{k(N/d-1)} x = 0$. This contradicts non-degeneracy of $g$. Thus the non-degeneracy of $1 + g^k$ is proved
for all $0 < k < N$. QED

The proof of Theorem 3.2.1. The matrix $g$ described in the statement of the theorem satisfies all
requirements of Lemma 3.2.3. The statement of the theorem follows from Definition 3.1.1 of the cone. QED

Theorem 3.2.1 allows one to determine whether the elements of $Z_N$ belong to the same cone by verifying
non-degeneracy conditions imposed on the generator alone.

The following corollary of Theorem 3.2.1 makes an explicit link between the constructed cone and RAID-6:

Corollary 3.2.4. Let $g$ be an $n \times n$ binary matrix such that $g^N = Id$ for an odd number $N$ and $Id + g^m$
is non-degenerate for every proper divisor $m$ of $N$. The systematic linear block code defined by the parity
check matrix

$P = \begin{pmatrix} I_{n\times n} & I_{n\times n} & I_{n\times n} & \cdots & I_{n\times n} & I_{n\times n} & 0_{n\times n} \\ I_{n\times n} & g & g^2 & \cdots & g^{N-1} & 0_{n\times n} & I_{n\times n} \end{pmatrix}$

can recover up to 2 $n$-bit lost words in known positions. Equivalently, the system of the parity check equations

(10)  $d_1 + d_2 + \dots + d_N + d_{N+1} = 0,$
      $d_1 + g d_2 + \dots + g^{N-1} d_N + d_{N+2} = 0$

has a unique solution with respect to any pair of variables $(d_i, d_j)$, $1 \le i < j \le N + 2$.

Proof. It follows from Theorem 3.2.1 that the first $N$ powers of $g$ belong to a cone. The statement of
the corollary is an immediate consequence of Lemma 3.1.2 for $g_k = g^{k-1}$, $1 \le k \le N$. QED.

As a simple application of Theorem 3.2.1, let us show that the parity check matrix (5) does indeed satisfy
all non-degeneracy requirements. The matrix $G = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$ is non-degenerate and has order 3. Also, the
matrix $Id + G = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ is non-degenerate. Hence, by virtue of Corollary 3.2.4, the parity check matrix
(5) determines a RAID system consisting of five disks that protects against the failure of any two disks.

3.3. Extension of $Z_N$-based cones for certain primes. We will now show that for certain primes, the
cone constructed in the previous subsection can be extended. The existence of such extensions gives some
curious conditions on a prime number, one of which is new to the best of our knowledge. We start with the
following

Lemma 3.3.1. Let $N > 2$ be a prime number. Then the group ring $R = F Z_N$ of the cyclic group of order
$N$ is isomorphic to $F \oplus K^k$ where $K = GF(2^d)$, $d$ is the smallest positive integer such that $2^d \equiv 1 \bmod N$,
and $k = (N-1)/d$.

Proof. By the Chinese Remainder Theorem, $R \cong \bigoplus_{j=0}^{k} F[X]/(f_j)$ where $X^N - 1 = f_0 \cdot f_1 \cdots f_k$ is the
decomposition into polynomials irreducible over $F$ and $f_0 = X - 1$. Let $\alpha$ be a root of $f_j$ for some $j > 0$.
Then $d = \deg f_j$ is the smallest number such that $\alpha \in GF(2^d) = F[X]/(f_j)$. Hence, $\alpha^{2^d - 1} = 1$ and $d$ is the
smallest with such property. As $N$ is prime, $\alpha$ is a primitive $N$-th root of unity. Hence, $N$ divides $2^d - 1$
and $d$ is the smallest with such property.
It follows that for $j > 0$ all $f_j$ have degree $d$ and all $F[X]/(f_j)$ are isomorphic to $GF(2^d)$. QED.

One case of particular interest is $k = 1$, which happens when 2 is a primitive $(N-1)$-th root of unity modulo
$N$, that is, when $d = N - 1$. Such primes in the first hundred are 3, 5, 11, 13, 19, 29, 37, 53, 59, 61,
67, 83 [15]. If a generalised Riemann hypothesis holds true then there are infinitely many such primes [S].
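The list of primes for which 2 is a primitive root can be reproduced by computing the multiplicative order $d$ of 2 modulo $N$ and keeping the primes with $d = N - 1$. The following is a minimal sketch; the helper names `order_of_two`, `is_prime`, `two_is_primitive_root` and `list_primes` are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Multiplicative order of 2 modulo n; n must be odd and greater than 1. */
static unsigned order_of_two(unsigned n)
{
    unsigned d = 1;
    uint64_t p = 2 % n;
    while (p != 1) {
        p = (p * 2) % n;
        d++;
    }
    return d;
}

/* Trial-division primality test, sufficient for small n. */
static int is_prime(unsigned n)
{
    if (n < 2)
        return 0;
    for (unsigned q = 2; q * q <= n; q++)
        if (n % q == 0)
            return 0;
    return 1;
}

/* 1 if n is an odd prime with d = n - 1, i.e. k = 1 in Lemma 3.3.1. */
static int two_is_primitive_root(unsigned n)
{
    return n > 2 && is_prime(n) && order_of_two(n) == n - 1;
}

/* Stores the qualifying primes below limit in out (at most cap entries)
   and returns how many were stored. */
static unsigned list_primes(unsigned limit, unsigned *out, unsigned cap)
{
    unsigned count = 0;
    for (unsigned n = 3; n < limit && count < cap; n++)
        if (two_is_primitive_root(n))
            out[count++] = n;
    return count;
}
```

The same `order_of_two` helper also gives the number $d$ of Lemma 3.3.1 for any odd prime, and hence $k = (N-1)/d$.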

Lemma 3.3.2. Let $N > 2$ be a prime number such that 2 is a primitive $(N-1)$-th root of unity modulo $N$.
Let $g$ be an $n \times n$ binary matrix such that $Id + g$ is non-degenerate and $g^N = Id$. Then the set of $2^{N-1} - 1$
matrices $S = \{g^{a_1} + g^{a_2} + \dots + g^{a_t} \mid 0 \le a_1 < a_2 < \dots < a_t < N,\ 1 \le t \le (N-1)/2\}$ is a cone.

Proof. Matrix $g$ defines a ring homomorphism $\phi : R \to M_n(F)$, $\phi(\sum_k a_k X^k) = \sum_k a_k g^k$, from the group
ring to a matrix ring. Since $1 + g$ is invertible,

$1 + g + g^2 + \dots + g^{N-1} = (1 + g^N)(1 + g)^{-1} = 0$

and $1 + X + \dots + X^{N-1}$ lies in the kernel of $\phi$. Since $R/(1 + X + \dots + X^{N-1}) \cong F[X]/(f_1) \cong K$, the image
$\phi(R)$ is a field and $S$ is a subset of $\phi(R)$. Finally, as $f_1 = 1 + X + \dots + X^{N-1}$ is the minimal polynomial of
$g$, all elements $g^{a_1} + g^{a_2} + \dots + g^{a_t}$ listed above are distinct and nonzero and $S = \phi(R) \setminus \{0\}$. QED.
Notice that for $N = 3$, the set $S$ consists only of the powers $Id$, $g$ and $g^2$.

The cone $S$ in Lemma 3.3.2 may be difficult to use in a real system but it contains a very convenient
subcone as soon as $N > 3$. This subcone consists of elements $g^k$ and $Id + g^k$. The following theorem gives
a condition on the prime $N$ for these elements to form a cone. This condition is new to the best of our
knowledge.

Theorem 3.3.3. The following conditions are equivalent for a prime number $N > 2$.

(1) For any $n \times n$ binary matrix $g$ such that $Id + g$ is non-degenerate and $g^N = Id$ the set of $2N - 1$
matrices $S = \{Id, g, g^2, \dots, g^{N-1}, Id + g, Id + g^2, \dots, Id + g^{N-1}\}$ is a cone.

(2) For no primitive $N$-th root of unity $\alpha$ in the algebraic closure of $F$, the element $\alpha + 1$ is an $N$-th root
of unity.

(3) For any $0 < m < N$ the polynomials $X^N + 1$ and $X^m + X + 1$ are relatively prime.

(4) No primitive $N$-th root of unity $\alpha$ in the algebraic closure of $F$ satisfies $\alpha^m + \alpha^l + 1 = 0$ with
$N > m > l > 0$.

Proof. First, we observe that (1) is equivalent to (4). If (4) fails, there exists an $N$-th root of unity $\alpha$
such that $\alpha^m + \alpha^l + 1 = 0$. Let $f(X)$ be the minimal polynomial of $\alpha$. The matrix $g$ of multiplication by
the coset of $X$ in $F[X]/(f)$ fails condition (1) with $g^m + g^l + 1 = 0$.

If (4) holds and $g$ is a matrix as in (1) then the elements of $S$ are all invertible matrices by Theorem 3.2.1.
Moreover, it only remains to establish that each matrix $g^m + g^l + Id$, $N > m > l > 0$, is invertible. Suppose
that it is not invertible. It must have an eigenvector $v \in F^n$ with the zero eigenvalue. It follows that $f_v(X)$,
the minimal polynomial of $g$ with respect to $v$, divides both $X^N + 1$ and $X^m + X^l + 1$. Since 1 is not a
root of $X^m + X^l + 1$, any root $\alpha$ of $f_v(X)$ in the algebraic closure of $F$ is a primitive $N$-th root of unity and
satisfies $\alpha^m + \alpha^l + 1 = 0$.

Equivalence of (4) and (3) is clear: $\beta = \alpha^l$ is also a primitive root, hence condition (4) can be rewritten
as no root $\beta$ satisfies $\beta^s + \beta + 1 = 0$ with $N > s > 0$. Thus, $X^N + 1$ and $X^s + X + 1$ do not have common
roots in the algebraic closure of $F$ and must be relatively prime.

Equivalence of (3) and (2) comes from rewriting $\alpha^m + \alpha + 1 = 0$ as $\alpha^m = \alpha + 1$ and observing that $\alpha^m$ is
necessarily a primitive $N$-th root of unity. QED.

This theorem allows us to sort out whether any particular prime N is suitable for extending the cone. 

Corollary 3.3.4. A Fermat prime $N > 3$ satisfies the conditions of Theorem 3.3.3. A Mersenne prime
fails the conditions of Theorem 3.3.3.

Proof. A Fermat prime is of the form $N = 2^d + 1$. Hence, for a primitive $N$-th root of unity $\alpha$,

$(\alpha + 1)^N = (\alpha + 1)^{2^d} (\alpha + 1) = (\alpha^{2^d} + 1)(\alpha + 1) = \alpha^{-1} + \alpha.$

If this is equal to 1, then $\alpha^2 + \alpha + 1 = 0$, forcing $N = 3$. A Mersenne prime is of the form $N = 2^d - 1$. Hence,

$(\alpha + 1)^N = (\alpha + 1)^{2^d} (\alpha + 1)^{-1} = (\alpha^{2^d} + 1)(\alpha + 1)^{-1} = (\alpha + 1)(\alpha + 1)^{-1} = 1.$

QED.

In fact, most of the primes appear to satisfy the conditions of Theorem 3.3.3. In the first 500 primes, the
only primes that fail are Mersenne primes and 73. Samir Siksek has found several more primes that fail but are
not Mersenne. These are (in brackets we state the order of 2 in the multiplicative group of $GF(p)$): 73 (9),
178481 (23), 262657 (27), 599479 (33), 616318177 (37), 121369 (39), 164511353 (41), 4432676798593 (49),
3203431780337 (59), 145295143558111 (65), 761838257287 (67), 10052678938039 (69), 9361973132609 (73),
581283643249112959 (77). It would be interesting to know whether there are infinitely many primes failing
the conditions of Theorem 3.3.3.
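Condition (3) of Theorem 3.3.3 lends itself to a direct machine check, since it only involves greatest common divisors of polynomials over $GF(2)$. The sketch below is an illustration under stated assumptions: polynomials are packed into 64-bit masks (bit $i$ is the coefficient of $X^i$), so it only covers $2 < N \le 63$, and the function names are hypothetical.

```c
#include <stdint.h>

/* Degree of a GF(2) polynomial stored as a bit mask; -1 for the zero polynomial. */
static int poly_degree(uint64_t p)
{
    int d = -1;
    while (p) {
        p >>= 1;
        d++;
    }
    return d;
}

/* Remainder of a modulo b over GF(2); b must be nonzero. */
static uint64_t poly_mod(uint64_t a, uint64_t b)
{
    int db = poly_degree(b);
    int da = poly_degree(a);
    while (da >= db) {
        a ^= b << (da - db);   /* cancel the leading term of a */
        da = poly_degree(a);
    }
    return a;
}

/* Euclidean algorithm over GF(2). */
static uint64_t poly_gcd(uint64_t a, uint64_t b)
{
    while (b) {
        uint64_t r = poly_mod(a, b);
        a = b;
        b = r;
    }
    return a;
}

/* Returns 1 if gcd(X^N + 1, X^m + X + 1) = 1 for all 0 < m < N.
   Only valid for 2 < N <= 63, so that X^N + 1 fits into 64 bits. */
static int satisfies_condition3(unsigned N)
{
    uint64_t xn1 = (UINT64_C(1) << N) | 1u;
    for (unsigned m = 1; m < N; m++) {
        /* XOR (not OR) so that m = 1 gives X + X + 1 = 1. */
        uint64_t q = (UINT64_C(1) << m) ^ 2u ^ 1u;
        if (poly_gcd(xn1, q) != 1)
            return 0;
    }
    return 1;
}
```

With this sketch the Fermat primes 5 and 17 pass, while 3 and the Mersenne primes 7 and 31 fail, in agreement with Corollary 3.3.4. Checking larger primes such as 73 would need a wider polynomial representation.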

Utilising the cone in Theorem 3.3.3, we start with a matrix generator of the cyclic group of an appropriate
prime order $N$ to build a RAID-6 system protecting up to $2N - 1$ information disks. The explicit expression
for Q-parity is

(11)  $Q = \sum_{k=0}^{N-1} g^k d_k + \sum_{k=N}^{2N-2} (Id + g^{k-N+1}) d_k,$

where $d_0, d_1, \dots, d_{2N-2}$ are information words.

3.4. Specific examples of matrix generators of $Z_N$ and the corresponding RAID-6 systems. Now
we are ready to construct explicit examples of RAID-6 based on the theory of cones developed in the above
subsections. The non-extended code, based on the Sylvester matrix, is known as EVENODD code.

Lemma 3.4.1. Let $S_N$ be the $(N-1) \times (N-1)$ Sylvester matrix,

$S_N = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 1 \\ 0 & 1 & \cdots & 0 & 1 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 1 \end{pmatrix}.$

Then

(i) $S_N$ has order $N$.

(ii) Matrix $Id + S_N$ is non-degenerate if $N$ is odd and is degenerate if $N$ is even.

Proof. (i) An explicit computation shows that for any $(N-1)$-dimensional binary vector $x$ and for any
$1 \le k \le N$,

(12)  $S_N^k \begin{pmatrix} x_{N-1} \\ x_{N-2} \\ x_{N-3} \\ \vdots \\ x_1 \end{pmatrix} = \begin{pmatrix} x_k \\ x_k \\ x_k \\ \vdots \\ x_k \end{pmatrix} + \begin{pmatrix} x_{k-1} \\ x_{k-2} \\ \vdots \\ x_1 \\ 0 \\ x_{N-1} \\ \vdots \\ x_{k+1} \end{pmatrix}.$

In the above formula $x_j = 0$ unless $1 \le j \le N - 1$. Therefore, $S_N^k \neq Id$ for any $1 \le k \le N - 1$. Setting
$k = N$ in the above formula, we get $S_N^N x = x$ for any $x$, which implies that $S_N^N = Id$. Therefore, the order
of the matrix $S_N$ is $N$.

(ii) The characteristic polynomial of $S_N$ is $f(x) = \sum_{k=0}^{N-1} x^k$. (In order to prove this it is sufficient to
notice that the matrix $S_N$ is the companion matrix of the polynomial $f(x)$ [6]. As such, $f(x)$ is both the
characteristic and the minimal polynomial of the matrix $S_N$.) Therefore,

$f(S_N) = \sum_{k=0}^{N-1} S_N^k = 0.$

Notice that the matrix $S_N$ is non-degenerate as it has a positive order. If $N$ is odd, we can re-write the
characteristic polynomial as

$f(S_N) = (Id + S_N)(1 + S_N^2 + S_N^4 + \dots + S_N^{N-3}) + S_N^{N-1}.$

Therefore, the degeneracy of $Id + S_N$ would contradict the non-degeneracy of $S_N$. If $N$ is even, the sum of all
rows of $Id + S_N$ is zero, which implies degeneracy. QED

Lemma 3.4.1 states that the matrix $S_N$ generates the cyclic group $Z_N$ and that the matrix $Id + S_N$ is
non-degenerate for any odd $N$. Given that $N$ is an odd prime, Corollary 3.2.4 implies that using parity
equations (10) with $g = S_N$, it is possible to protect $N$ data disks against the failure of any two disks.
Furthermore, if $N > 3$ is a Fermat prime or 2 is a primitive root modulo $N$, $2N - 1$ data disks can be
protected against double failure thanks to the results of Section 3.3.

We will refer to the RAID-6 system based on the Sylvester matrix $S_N$ as $Z_N$-RAID. Let us give several
examples of such systems.

(1) $Z_3$-RAID has been considered in subsections 2.1, 2.2. It can protect up to 3 information disks
against double failure. As $N = 3$, protection of 5 information disks using extended Q-parity (11) is
impossible.

(2) Using $Z_{17}$-RAID, one can protect up to $N = 17$ disks using Q-parity (10) and up to $2N - 1 = 33$
disks using extended Q-parity (11).

(3) Using $Z_{257}$-RAID, one can protect up to $N = 257$ disks using Q-parity (10) and up to $2N - 1 = 513$
disks using extended Q-parity (11).

It can be seen from (12) that the multiplication of data vectors with any power of the Sylvester matrix
$S_N$ requires one left and one right shift, one $n$-bit XOR and one AND only. Thus the operations of updating
Q-parity and recovering data within $Z_N$-RAID do not require any special instructions, such as Galois
field look-up tables for logarithms and products. As a result, the implementation of $Z_N$-RAID can in some
cases be more efficient and quick than the implementation of the more conventional Reed-Solomon based
RAID-6. In the next section we will demonstrate the advantage of $Z_N$-RAID using an example of a Linux
kernel implementation of the $Z_{17}$-RAID system.
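To make the shift-and-mask structure concrete, here is a minimal sketch of multiplication by a power of the Sylvester matrix for $N = 17$ on 16-bit words. It is not the kernel code discussed in the next section. It assumes the polynomial basis, in which bit $i$ of the word is the coefficient of $X^i$ in $F[X]/(1 + X + \dots + X^{16})$, so that multiplying by $X^k$ is a 17-bit cyclic rotation followed by a conditional complement; the names `sylvester_mul` and `powers_form_cone` are hypothetical.

```c
#include <stdint.h>

#define ZN 17u                      /* prime order N of the cyclic group  */
#define FULL ((1u << ZN) - 1u)      /* 17 ones: 1 + X + ... + X^16        */

/* Multiply the 16-bit word v by the k-th power of the generator, 0 <= k < 17. */
static uint16_t sylvester_mul(uint16_t v, unsigned k)
{
    uint32_t w = v;                                       /* X^16 coefficient is 0  */
    uint32_t rot = ((w << k) | (w >> (ZN - k))) & FULL;   /* times X^k mod X^17 + 1 */
    uint32_t top = (rot >> (ZN - 1u)) & 1u;               /* coefficient of X^16    */
    rot ^= FULL & (0u - top);     /* if set, add 1 + X + ... + X^16 to clear it */
    return (uint16_t)rot;
}

/* Returns 1 if (Id + g^k) v != 0 for every nonzero v and 0 < k < 17,
   i.e. if the 17 powers of the generator form a cone. */
static int powers_form_cone(void)
{
    for (unsigned k = 1; k < ZN; k++)
        for (uint32_t v = 1; v <= 0xFFFFu; v++)
            if ((sylvester_mul((uint16_t)v, k) ^ v) == 0)
                return 0;
    return 1;
}
```

In this sketch the Q-parity update for data word $d_k$ amounts to `q ^= sylvester_mul(d_k, k)`, with no look-up tables involved.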

4. Linux Kernel $Z_N$-RAID Implementation

4.1. Syndrome Calculation for the Reed-Solomon RAID-6. First, let us briefly recall the RAID-6
scheme based on the Reed-Solomon code in the Galois field F, see [1] for more details. Let $D_0, \dots, D_{N-1}$ be the
bytes of data from $N$ information disks. Then the parity bytes $P$ and $Q$ are computed as follows, using
$g = \{02\} \in F$:$^4$

(13)  $P = D_0 + D_1 + \dots + D_{N-1}, \quad Q = D_0 + g D_1 + \dots + g^{N-1} D_{N-1}.$

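The multiplication by $g = \{02\}$ can be viewed as the following matrix multiplication.

(14)
$$
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
=
\begin{bmatrix} x_7 \\ x_0 \\ x_1 \oplus x_7 \\ x_2 \oplus x_7 \\ x_3 \oplus x_7 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
$$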



Given (14), the parity equations become similar to (10). Indeed, the element $g$ generates a cyclic group,
so a 2-error correcting Reed-Solomon code is a special case of a cone based RAID. However, $\mathbb{Z}_N$-RAID has
several advantages. For instance, using Sylvester matrices one can achieve a simpler implementation of
matrix multiplication.



Algebraically, we use the standard representation in electronics: $F = GF(2)[x]/I$ where the ideal $I$ is generated by
$x^8 + x^4 + x^3 + x^2 + 1$ and $g = x + I$.
4.2. Linux Kernel Implementation of Syndrome Calculation. To compute the Q-parity, we rewrite
(13) as

(15) $Q = D_0 + g(D_1 + g(\ldots + g(D_{N-2} + g D_{N-1}) \ldots)),$

which requires $(N - 1)$ multiplications by $g = \{02\}$.

The product $y$ of a single byte $x$ and $g = \{02\}$ can be implemented as follows.

    uint8_t x, y;
    y = (x << 1) ^ ((x & 0x80) ? 0x1d : 0x00);

Notice that (x & 0x80) picks out $x_7$ from x, so

    ((x & 0x80) ? 0x1d : 0x00)

selects between the two bit patterns 00011101 and 00000000 depending on $x_7$. Since the carry is discarded
from (x << 1),

(16) (x << 1) $= [x_6, x_5, x_4, x_3, x_2, x_1, x_0, 0]$, ((x & 0x80) ? 0x1d : 0x00) $= [0, 0, 0, x_7, x_7, x_7, 0, x_7]$.

We can also implement the multiplication as follows.

    int8_t x, y;
    y = (x + x) ^ (((x < 0) ? 0xff : 0x00) & 0x1d);

Here we treat the values as signed, rather than unsigned. Whilst this implementation appears more 
complex than the first (since it uses addition and comparison), it can efficiently be implemented using 
SIMD instructions on modern processors, such as MMX/SSE/SSE2/AltiVec. 
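
As a quick sanity check of the two formulas above, the following C sketch compares the unsigned and the signed variant on all 256 byte values. The function names are ours and purely illustrative; they are not part of the Linux kernel. The sketch assumes the usual two's complement conversion between uint8_t and int8_t.

```c
#include <stdint.h>

/* Multiply a byte by g = {02} modulo x^8 + x^4 + x^3 + x^2 + 1.
 * Unsigned variant: shift left, then fold the lost carry back in as 0x1d. */
static uint8_t gf_mul2_unsigned(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0x00));
}

/* Signed variant: a negative int8_t has bit x_7 set, so the comparison
 * produces the mask 0xff exactly when the carry must be folded back. */
static uint8_t gf_mul2_signed(int8_t x)
{
    uint8_t mask = (x < 0) ? 0xff : 0x00;
    return (uint8_t)(((uint8_t)x + (uint8_t)x) ^ (mask & 0x1d));
}

/* Returns 1 if both variants agree on every byte value, 0 otherwise. */
static int gf_mul2_variants_agree(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t u = (uint8_t)v;
        if (gf_mul2_unsigned(u) != gf_mul2_signed((int8_t)u))
            return 0;
    }
    return 1;
}
```

The signed form matters only because SSE2 offers a signed byte comparison but no per-byte shift; on scalar code either variant would do.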

In particular, we will use the following four SSE2 instructions, which store the result in place of the second
operand:

    pxor    x, y : y = x ^ y;
    pand    x, y : y = x & y;
    paddb   x, y : y = x + y;
    pcmpgtb x, y : y = (y > x) ? 0xff : 0x00;

Therefore we can implement a single multiplication with the following pseudo SSE2 assembler code. We
assume that the variables y and c are initialised as y = 0 and c = 0x1d.

    pcmpgtb x, y : y = (x < 0) ? 0xff : 0x00; // (x < 0) ? 0xff : 0x00
    paddb   x, x : x = x + x;                 // x + x
    pand    c, y : y = y & 0x1d;              // ((x < 0) ? 0xff : 0x00) & 0x1d
    pxor    x, y : y = x ^ y;                 // (x + x) ^ (((x < 0) ? 0xff : 0x00) & 0x1d)

The comparison operation overwrites the constant stored in y. Therefore, when we implement the
complete algorithm we must recreate the constant before each multiplication. We can do it as follows.

    pxor    y, y : y = y ^ y;                 // y ^ y = 0

Besides the five instructions above, which perform the multiplication, we need three more instructions to
complete the inner loop of the algorithm: one to fetch a new byte of data $D$ and two to update the parity
variables $P$ and $Q$:

(17) $P \leftarrow D + P, \quad Q \leftarrow D + gQ.$



The complete algorithm requires the following eight instructions.

    pxor    y, y    : y = y ^ y;                 // y ^ y = 0
    pcmpgtb q, y    : y = (q < 0) ? 0xff : 0x00; // (q < 0) ? 0xff : 0x00
    paddb   q, q    : q = q + q;                 // q + q
    pand    c, y    : y = y & 0x1d;              // ((q < 0) ? 0xff : 0x00) & 0x1d
    pxor    y, q    : q = q ^ y;                 // g.q = (q + q) ^ (((q < 0) ? 0xff : 0x00) & 0x1d)
    movdqa  d[i], d : d = d[i];                  // d[i]
    pxor    d, q    : q = d ^ q;                 // d[i] ^ g.q
    pxor    d, p    : p = d ^ p;                 // d[i] ^ p

We can gain a further increase in speed by partially unrolling the 'for' loop around the inner loop. 
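
The inner loop can be mirrored in portable C, one byte at a time instead of one SSE2 register at a time. The sketch below is only an illustration under that assumption: it evaluates $Q$ in the Horner form (15) using the update (17), and for comparison evaluates (13) directly. The names rs_syndrome and rs_syndrome_direct are ours.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiplication by g = {02}, as in the scalar formula of this section. */
static uint8_t gf_mul2(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0x00));
}

/* Horner evaluation: walk from the last disk to the first and apply
 * P <- D + P, Q <- D + gQ, i.e. n multiplications by g in total
 * (the first one acts on zero). */
static void rs_syndrome(const uint8_t *d, size_t n, uint8_t *p, uint8_t *q)
{
    uint8_t pp = 0, qq = 0;
    for (size_t i = n; i-- > 0; ) {
        qq = (uint8_t)(d[i] ^ gf_mul2(qq));
        pp = (uint8_t)(pp ^ d[i]);
    }
    *p = pp;
    *q = qq;
}

/* Direct evaluation of (13): Q = sum of g^i D_i, with g^i applied by
 * i successive multiplications by g. Quadratic cost, reference only. */
static void rs_syndrome_direct(const uint8_t *d, size_t n, uint8_t *p, uint8_t *q)
{
    uint8_t pp = 0, qq = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t t = d[i];
        for (size_t j = 0; j < i; j++)
            t = gf_mul2(t);
        pp = (uint8_t)(pp ^ d[i]);
        qq = (uint8_t)(qq ^ t);
    }
    *p = pp;
    *q = qq;
}
```

Both routines return the same pair $(P, Q)$; the Horner form is the one that maps onto the eight-instruction loop.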
4.3. Reconstruction. We consider a situation in which two data disks $D_x$ and $D_y$ have failed. We must
reconstruct $D_x$ and $D_y$ from the remaining data disks $D_i$ ($i \neq x, y$) and the parity disks $P$ and $Q$, see (13).
Let us define $P_{xy}$ and $Q_{xy}$ as the syndromes under an assumption that the failed disks were zero:

(18) $P_{xy} = \sum_{i \neq x, y} D_i, \quad Q_{xy} = \sum_{i \neq x, y} g^i D_i.$

Rewriting (13) in the light of (18),

(19) $D_x + D_y = P + P_{xy}, \quad g^x D_x + g^y D_y = Q + Q_{xy}.$

Let us define

(20) $A = (1 + g^{y-x})^{-1}, \quad B = g^{-x}(1 + g^{y-x})^{-1}.$

Now we eliminate $D_x$ from equations (19):

(21) $D_y = (1 + g^{y-x})^{-1}(P + P_{xy}) + g^{-x}(1 + g^{y-x})^{-1}(Q + Q_{xy}) = A(P + P_{xy}) + B(Q + Q_{xy}).$

Finally, $D_x$ is computed from $D_y$ by the back substitution into (19):

(22) $D_x = D_y + (P + P_{xy}).$



4.4. Linux Kernel Implementation of Reconstruction. We compute the following values in $F$:

(23) $A = (1 + g^{y-x})^{-1}, \quad B = g^{-x}(1 + g^{y-x})^{-1} = (g^x + g^y)^{-1},$
     $D_y = A(P + P_{xy}) + B(Q + Q_{xy}), \quad D_x = D_y + (P + P_{xy}).$

It is worth pointing out that for specific $x$ and $y$, we only need to compute $A$ and $B$ once. The Linux kernel
provides the following look-up tables:

    raid6_gfmul[256][256] : $xy$            raid6_gfexp[256] : $g^x$
    raid6_gfinv[256]      : $x^{-1}$        raid6_gfexi[256] : $(1 + g^x)^{-1}$

Using this, we compute $A$ and $B$ as follows:

    A = raid6_gfexi[y-x] and B = raid6_gfinv[raid6_gfexp[x] ^ raid6_gfexp[y]]

To reconstruct $D_x$ and $D_y$ we start by constructing $P_{xy}$ and $Q_{xy}$ using the standard syndrome code. Then
we execute the following code.

    dP = P ^ Pxy;                                 // P + Pxy
    dQ = Q ^ Qxy;                                 // Q + Qxy
    Dy = raid6_gfmul[A][dP] ^ raid6_gfmul[B][dQ]; // A(P + Pxy) + B(Q + Qxy)
    Dx = Dy ^ dP;                                 // Dy + (P + Pxy)
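
To make the reconstruction formulas concrete without depending on the kernel tables, here is a minimal self-contained C sketch that computes $A$ and $B$ arithmetically and recovers two lost data bytes. It is an illustration only: gf_mul, gf_pow_g, gf_inv, rs_recover2 and rs_recover2_roundtrip are hypothetical helper names of ours, and the slow bit-by-bit arithmetic stands in for the look-up tables.

```c
#include <stdint.h>

static uint8_t gf_mul2(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0x00));
}

/* Product in F by shift-and-add (stands in for a 256x256 product table). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = gf_mul2(a);
        b >>= 1;
    }
    return r;
}

/* g^e by repeated multiplication by g. */
static uint8_t gf_pow_g(unsigned e)
{
    uint8_t r = 1;
    while (e--)
        r = gf_mul2(r);
    return r;
}

/* a^{-1} = a^{254} for a != 0, since the nonzero elements of F form a
 * group of order 255. */
static uint8_t gf_inv(uint8_t a)
{
    uint8_t r = 1;
    for (int i = 0; i < 254; i++)
        r = gf_mul(r, a);
    return r;
}

/* Recover D_x and D_y (x < y) from P, Q and the partial syndromes. */
static void rs_recover2(unsigned x, unsigned y, uint8_t p, uint8_t q,
                        uint8_t pxy, uint8_t qxy, uint8_t *dx, uint8_t *dy)
{
    uint8_t a = gf_inv((uint8_t)(1 ^ gf_pow_g(y - x)));         /* A */
    uint8_t b = gf_inv((uint8_t)(gf_pow_g(x) ^ gf_pow_g(y)));   /* B */
    uint8_t dp = (uint8_t)(p ^ pxy), dq = (uint8_t)(q ^ qxy);
    *dy = (uint8_t)(gf_mul(a, dp) ^ gf_mul(b, dq));
    *dx = (uint8_t)(*dy ^ dp);
}

/* Encode n bytes, pretend disks x and y are lost, recover and compare.
 * Returns 1 on success. */
static int rs_recover2_roundtrip(const uint8_t *d, unsigned n, unsigned x, unsigned y)
{
    uint8_t p = 0, q = 0, pxy = 0, qxy = 0, dx, dy;
    for (unsigned i = 0; i < n; i++) {
        uint8_t t = gf_mul(gf_pow_g(i), d[i]);
        p ^= d[i];
        q ^= t;
        if (i != x && i != y) {
            pxy ^= d[i];
            qxy ^= t;
        }
    }
    rs_recover2(x, y, p, q, pxy, qxy, &dx, &dy);
    return dx == d[x] && dy == d[y];
}
```

In a table-driven implementation the four gf_pow_g and gf_inv calls collapse into the two look-ups for $A$ and $B$, which is why they need to be computed only once per pair $(x, y)$.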



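4.5. Z17-RAID Implementation. The multiplication by the Sylvester matrix $g$ looks like

(24)
$$
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \\ y_{10} \\ y_{11} \\ y_{12} \\ y_{13} \\ y_{14} \\ y_{15} \end{bmatrix}
=
\left[\begin{array}{cccccccccccccccc}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{array}\right]
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ x_9 \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{13} \\ x_{14} \\ x_{15} \end{bmatrix}
=
\begin{bmatrix} x_{15} \\ x_0 \oplus x_{15} \\ x_1 \oplus x_{15} \\ x_2 \oplus x_{15} \\ x_3 \oplus x_{15} \\ x_4 \oplus x_{15} \\ x_5 \oplus x_{15} \\ x_6 \oplus x_{15} \\ x_7 \oplus x_{15} \\ x_8 \oplus x_{15} \\ x_9 \oplus x_{15} \\ x_{10} \oplus x_{15} \\ x_{11} \oplus x_{15} \\ x_{12} \oplus x_{15} \\ x_{13} \oplus x_{15} \\ x_{14} \oplus x_{15} \end{bmatrix}
$$

We implement the multiplication of a double byte $y = gx$ as follows:

    int16_t x, y;
    y = (x + x) ^ ((x < 0) ? 0xffff : 0x0000);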

We can implement this in assembler using the following seven instructions.

    pxor    y, y    : y = y ^ y;                     // y ^ y = 0
    pcmpgtw q, y    : y = (q < 0) ? 0xffff : 0x0000; // (q < 0) ? 0xffff : 0x0000
    paddw   q, q    : q = q + q;                     // q + q
    pxor    y, q    : q = q ^ y;                     // g.q = (q + q) ^ ((q < 0) ? 0xffff : 0x0000)
    movdqa  d[i], d : d = d[i];                      // d[i]
    pxor    d, q    : q = d ^ q;                     // d[i] ^ g.q
    pxor    d, p    : p = d ^ p;                     // d[i] ^ p
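
The following C sketch, offered as an illustration rather than as the kernel code, checks the one-line formula for $y = gx$ against a bit-by-bit evaluation of (24), verifies that $g^{17}$ acts as the identity, and shows the syndrome loop on 16-bit words. All function names are ours.

```c
#include <stddef.h>
#include <stdint.h>

/* y = g x: shift left and XOR with all ones when bit x_15 is set. */
static uint16_t z17_mul_g(uint16_t x)
{
    uint16_t mask = (x & 0x8000) ? 0xffff : 0x0000;
    return (uint16_t)((x << 1) ^ mask);
}

/* The same product read off row by row from the matrix in (24):
 * y_0 = x_15 and y_i = x_{i-1} XOR x_15 for i = 1..15. */
static uint16_t z17_mul_g_bitwise(uint16_t x)
{
    unsigned x15 = (x >> 15) & 1u;
    unsigned y = x15;
    for (unsigned i = 1; i < 16; i++)
        y |= (((x >> (i - 1)) & 1u) ^ x15) << i;
    return (uint16_t)y;
}

/* Returns 1 if, for every 16-bit word, both forms agree and
 * seventeen multiplications by g give the word back. */
static int z17_checks(void)
{
    for (unsigned v = 0; v < 0x10000u; v++) {
        uint16_t x = (uint16_t)v, t = x;
        if (z17_mul_g(x) != z17_mul_g_bitwise(x))
            return 0;
        for (int i = 0; i < 17; i++)
            t = z17_mul_g(t);
        if (t != x)
            return 0;
    }
    return 1;
}

/* Syndrome loop on words: P <- D + P, Q <- D + gQ, last disk first. */
static void z17_syndrome(const uint16_t *d, size_t n, uint16_t *p, uint16_t *q)
{
    uint16_t pp = 0, qq = 0;
    for (size_t i = n; i-- > 0; ) {
        qq = (uint16_t)(d[i] ^ z17_mul_g(qq));
        pp = (uint16_t)(pp ^ d[i]);
    }
    *p = pp;
    *q = qq;
}
```

Compared with the Reed-Solomon loop, the AND with the constant 0x1d disappears because the mask itself is the reduction pattern.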



Below are the results of the Linux kernel RAID-6 algorithm selection programs, aimed to select the fastest
implementation of the algorithm. Algorithms using CPU/MMX/SSE/SSE2 instructions with various levels
of unrolling are compared. The results were obtained from a 2.8 GHz Intel Pentium 4 (x86).

    Algorithm    DP-RAID      Z17-RAID
    int32x1       694 MB/s     766 MB/s
    int32x2       939 MB/s     854 MB/s
    int32x4       635 MB/s     838 MB/s
    int32x8       505 MB/s     604 MB/s
    mmxx1        1893 MB/s    2117 MB/s
    mmxx2        2025 MB/s    2301 MB/s
    sse1x1       1200 MB/s    1284 MB/s
    sse1x2       2000 MB/s    2263 MB/s
    sse2x1       1850 MB/s    2357 MB/s
    sse2x2       2702 MB/s    3160 MB/s

Comparing the above results against the standard Linux kernel results shows an average of 14.5% speed 
increase and an increase of 16.9% for the fastest sse2x2 implementation. This is consistent with the 
theoretical increase of 14.3% for seven instructions instead of eight instructions. It is worth mentioning 
that no look-up tables have been used to implement Z17-RAID. 



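4.6. $\mathbb{Z}_N$-RAID Reconstruction. We need to compute the following matrices and vectors:

(25) $A = (1 + g^{y-x})^{-1}, \quad B = g^{-x}(1 + g^{y-x})^{-1},$

(26) $D_y = A(P + P_{xy}) + B(Q + Q_{xy}), \quad D_x = D_y + (P + P_{xy}).$

We rewrite them as follows:

$z = y - x, \quad \Delta P = P + P_{xy}, \quad \Delta Q = Q + Q_{xy},$

$D_y = (1 + g^z)^{-1}\Delta P + g^{-x}(1 + g^z)^{-1}\Delta Q, \quad D_x = D_y + \Delta P.$

Using the standard identities $g^{17} = 1$ and $(1 + g)^{-1} = 1 + g^2 + g^4 + \ldots + g^{16}$ we derive new identities:

(27) $(1 + g^z)^{-1} = 1 + g^{2z} + g^{4z} + \ldots + g^{16z}, \quad g^{-x}(1 + g^z)^{-1} = g^{17-x}(1 + g^{2z} + g^{4z} + \ldots + g^{16z}).$

Consequently, we need to compute

(28) $(1 + g^z)^{-1}\Delta P = (1 + g^{2z} + g^{4z} + \ldots + g^{16z})\Delta P$
     $= \Delta P + g^{2z}(\Delta P + g^{2z}(\Delta P + g^{2z}(\Delta P + g^{2z}(\Delta P + g^{2z}(\Delta P + g^{2z}(\Delta P + g^{2z}(\Delta P + g^{2z}\Delta P)))))))$

and

(29) $g^{-x}(1 + g^z)^{-1}\Delta Q = g^{17-x}(1 + g^{2z} + g^{4z} + \ldots + g^{16z})\Delta Q$
     $= g^{17-x}(\Delta Q + g^{2z}(\Delta Q + g^{2z}(\Delta Q + g^{2z}(\Delta Q + g^{2z}(\Delta Q + g^{2z}(\Delta Q + g^{2z}(\Delta Q + g^{2z}(\Delta Q + g^{2z}\Delta Q)))))))).$

Both (28) and (29) require only one principal operation, multiplication by $g^k$.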
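
The nested evaluations (28) and (29) can be sketched in C as follows. This is an illustration under our own assumptions: disk indices satisfy $0 \le x < y \le 16$, powers of $g$ are applied by repeated multiplication by $g$ for brevity, and the names z17_mul_pow, z17_inv_one_plus_gz, z17_recover2 and z17_roundtrip are hypothetical.

```c
#include <stdint.h>

/* y = g x on a 16-bit word. */
static uint16_t z17_mul_g(uint16_t x)
{
    uint16_t mask = (x & 0x8000) ? 0xffff : 0x0000;
    return (uint16_t)((x << 1) ^ mask);
}

/* y = g^k x by k successive multiplications by g (slow, for clarity). */
static uint16_t z17_mul_pow(uint16_t x, unsigned k)
{
    while (k--)
        x = z17_mul_g(x);
    return x;
}

/* (1 + g^z)^{-1} v = (1 + g^{2z} + ... + g^{16z}) v, evaluated in nested
 * form with eight multiplications by g^{2z}; requires 0 < z < 17. */
static uint16_t z17_inv_one_plus_gz(uint16_t v, unsigned z)
{
    unsigned k = (2 * z) % 17;
    uint16_t acc = v;
    for (int i = 0; i < 8; i++)
        acc = (uint16_t)(v ^ z17_mul_pow(acc, k));
    return acc;
}

/* D_y = (1+g^z)^{-1} dP + g^{17-x} (1+g^z)^{-1} dQ,  D_x = D_y + dP. */
static void z17_recover2(unsigned x, unsigned y, uint16_t dp, uint16_t dq,
                         uint16_t *dx, uint16_t *dy)
{
    unsigned z = y - x;
    uint16_t a = z17_inv_one_plus_gz(dp, z);
    uint16_t b = z17_mul_pow(z17_inv_one_plus_gz(dq, z), (17 - x) % 17);
    *dy = (uint16_t)(a ^ b);
    *dx = (uint16_t)(*dy ^ dp);
}

/* Encode n <= 17 words, drop disks x and y, recover and compare. */
static int z17_roundtrip(const uint16_t *d, unsigned n, unsigned x, unsigned y)
{
    uint16_t p = 0, q = 0, pxy = 0, qxy = 0, dx, dy;
    for (unsigned i = 0; i < n; i++) {
        uint16_t t = z17_mul_pow(d[i], i);
        p ^= d[i];
        q ^= t;
        if (i != x && i != y) {
            pxy ^= d[i];
            qxy ^= t;
        }
    }
    z17_recover2(x, y, (uint16_t)(p ^ pxy), (uint16_t)(q ^ qxy), &dx, &dy);
    return dx == d[x] && dy == d[y];
}
```

No inverse table is needed: the inverse of $1 + g^z$ is obtained from the geometric-type sum alone, which is the point of the identities above.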
The multiplication of a single word $y = g^k x$ for $1 \le k \le 16$ can be implemented as follows.

    int16_t x, y;
    y = (x << k) ^ (x >> (17 - k)) ^ (((x << (k - 1)) < 0) ? 0xffff : 0x0000);

Here the shifts act on 16-bit words: bits shifted out of the word are discarded and the right shift is logical,
as in the SSE2 instructions psllw and psrlw. We precompute m = k - 1 and n = 17 - k as they remain constant
during reconstruction. This leads to the following assembler implementation.

    pxor    y, y : y = 0
    movdqa  x, z : z = x
    psllw   m, z : z = x << (k-1)
    pcmpgtw z, y : y = ((x << (k-1)) < 0) ? 0xffff : 0x0000
    paddw   z, z : z = x << k
    pxor    z, y : y = (((x << (k-1)) < 0) ? 0xffff : 0x0000) ^ (x << k)
    psrlw   n, x : x = x >> (17-k)
    pxor    x, y : y = g^k x
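
A portable C rendering of the shift formula is sketched below, with explicit casts so that the 16-bit truncation and the logical right shift are spelled out. It is our own illustration; z17_mul_gk and z17_gk_matches_repeated are hypothetical names. The check compares the formula with $k$ successive multiplications by $g$ on every word.

```c
#include <stdint.h>

/* y = g x on a 16-bit word. */
static uint16_t z17_mul_g(uint16_t x)
{
    uint16_t mask = (x & 0x8000) ? 0xffff : 0x0000;
    return (uint16_t)((x << 1) ^ mask);
}

/* y = g^k x for 1 <= k <= 16 using two shifts and one mask.
 * hi:   bits that stay inside the word after moving up by k;
 * lo:   bits that pass position 16 and wrap around, since g^17 = 1;
 * mask: all ones when bit 16-k of x lands on position 16, which is
 *       replaced by 1 + g + ... + g^15. */
static uint16_t z17_mul_gk(uint16_t x, unsigned k)
{
    uint16_t hi = (uint16_t)((uint32_t)x << k);
    uint16_t lo = (uint16_t)((uint32_t)x >> (17 - k));
    uint16_t mask = (((uint32_t)x << (k - 1)) & 0x8000u) ? 0xffff : 0x0000;
    return (uint16_t)(hi ^ lo ^ mask);
}

/* Returns 1 if the shift formula agrees with repeated multiplication
 * by g for every k in 1..16 and every 16-bit word. */
static int z17_gk_matches_repeated(void)
{
    for (unsigned k = 1; k <= 16; k++) {
        for (unsigned v = 0; v < 0x10000u; v++) {
            uint16_t t = (uint16_t)v;
            for (unsigned i = 0; i < k; i++)
                t = z17_mul_g(t);
            if (t != z17_mul_gk((uint16_t)v, k))
                return 0;
        }
    }
    return 1;
}
```

The cost of one multiplication by $g^k$ is thus independent of $k$, which keeps the eight nested steps of the reconstruction cheap.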

Below is a table showing benchmark results of the complete reconstruction algorithm implemented using SSE2
assembler and the standard Linux kernel look-up table reconstruction implementation, for the cases of
double data disk failure (DD), double disk failure of one data disk and the P-parity disk (DP), and double
parity disk failure (PQ). Note that the data represents the time taken to complete the benchmark, so lower is better.

    Failure      DD      DP      PQ
    DP-RAID      2917    2771    905
    Z17-RAID     2711    1274    809
Comparing the complete reconstruction algorithm implemented using SSE2 assembler against the standard
Linux kernel look-up table implementation shows a reduction in benchmark time of approximately 7% for DD
failure, 54% for DP failure and 11% for PQ failure.

5. Conclusions. 

In this paper we have demonstrated that cones provide a natural framework for the design of RAID. They
provide a flexible approach that can be used to design a system. Two theoretical questions merit further
investigation: what other examples of cones can be constructed, and what the maximal possible size of a cone is.

We have also demonstrated that cyclic groups give rise to natural and convenient to operate examples of
cones. One particular advantage is that $\mathbb{Z}_N$-RAID does not require support of the Galois field operations.

On the practical side, Z17-RAID and Z257-RAID are breakthrough techniques that show at least 10%
improvement during simulations compared to DP-RAID.